STAT100 Trimester 2, 2019
PART I 


Answer the following multiple choice questions (Questions 1-25) on the given 
MULTIPLE CHOICE ANSWER SHEET. Ensure that you enter the unit code 
(STAT100), your name and student number on the sheet. 


Question 1 [2 Marks] 


A researcher is interested in the views of Australian university students on campus
protests. They have collected data by distributing questionnaires. A selection of
Australian universities has been chosen, and questionnaires have been sent to 100
students from each of these.


What type of sampling method has been implemented? 


1. Simple random sampling 
2. Stratified sampling 

3. Cluster sampling 

4. Multi-stage sampling 


5. None of the above 


This scenario applies to Questions 2, 3 and 4: The suitability of a new variety of 
cucumber for commercial growing in Australia is being investigated. Researchers have 
planted 12 fields, half with the market leading variety of cucumber, and half with the 
new variety. Average yield per hectare is measured in kg for each field. 


Question 2 [2 Marks] 


Identify the response and explanatory variables in this study: 


1. Old variety (explanatory), New variety (response) 

2. New variety (explanatory), Yield per hectare (response) 
3. Yield per hectare (explanatory), Variety (response) 

4. Variety (explanatory), Yield per hectare (response) 


5. Field (explanatory), Variety (response)


Question 3 [2 Marks]


What are the cases/subjects in this study? 


1. Fields 

2. Cucumber varieties 
3. Seasons 

4. Hectare 


5. Yield per hectare 


Question 4 [2 Marks] 


Which plot is the most appropriate for exploring the data from this study? 


1. A scatterplot 
2. A boxplot 

3. A histogram 

4. A mosaic plot 


5. Side-by-side boxplots


Question 5 [2 Marks]


A survey asked respondents to identify their age by choosing from the following: 


18-25 26-35 36-45 46-55 


What type of variable is this? 


1. Numeric 
2. Continuous 
3. Discrete 
4. Ordinal 


5. There isn’t enough information provided here to determine 


Question 6 [2 Marks] 


Which of the following examples involves paired data? 


1. The effectiveness of a new sunscreen product was tested on 50 random attendees 
at the swimming pool. They were all asked whether they preferred this sunscreen 
to their regular product. 


2. The effectiveness of a new sunscreen product was tested on 50 random swimmers 
and 50 random tennis players. The effectiveness between these two groups was 
compared. 


3. The effectiveness of a new sunscreen product was tested on 50 random attendees 
at the swimming pool. The new product was applied to one arm and their regular 
product was applied to the other arm. The relative protection between arms was 
compared. 


4. Sun exposure was compared between male and female swimmers. 


5. None of the above.


Question 7 [2 Marks]


Which of the following statements best describes the scatterplot in Figure 1? 


Figure 1 


1. There is a weak relationship between Y and X. 

2. There is a strong relationship between Y and X. 

3. There is a strong linear relationship between Y and X. 

4. There is a positive non-linear relationship between Y and X. 


5. There is a positive linear relationship between Y and X. 


Question 8 [2 Marks]


Which one of the histograms in Figure 2 most likely matches the boxplot in Figure 3? 


Figure 2 


1. A
2. B
3. C
4. It could be any of A, B or C


5. None of the above 


Figure 3


Question 9 [2 Marks]


Which of the following best describes the standardised z-score for an observation? 


1. It is the most common score for that type of observation. 
2. It is the standard deviation for that observation. 


3. It is the number of standard deviations the observation falls above or below the 
mean. 


4. It is how many observations that observation falls above or below the mean. 


5. None of the above. 


Question 10 [2 Marks] 


A and B are disjoint events. Which of the following probability statements is true?


1. P(A or B) = P(A) + P(B) 
2. P(A and B) = P(A) x P(B) 

3. P(A) + P(B) =1 

4. P(A and B) = P(A|B) x P(B) 


5. None of the above


This scenario applies to Questions 11 and 12: The effectiveness of a new treatment
for insomnia has been investigated, with some study participants receiving the
new treatment, while others received no treatment. Whether the participants saw an
improvement or not after 3 weeks was recorded. The results are presented in Table 1.


Table 1: Contingency table for insomnia study


             | No improvement | Improvement | Total
No treatment |             27 |          23 |    50
Treatment    |             38 |          62 |   100
Total        |             65 |          85 |   150


Question 11 [2 Marks]


What is the probability that someone received no treatment and saw no improvement? 


1. 0.180 
2. 0.540 
3. 0.415 
4. 0.144 


5. None of the above 


Question 12 [2 Marks]


What is the probability that someone who received treatment saw an improvement? 


1. 0.418 
2. 0.620 
3. 0.765 
4. 0.378 


5. None of the above


Question 13 [2 Marks]


The height of adult women is assumed to follow a normal distribution with μ = 162cm
and σ = 5cm. Petra is 172cm tall. Which of the following best describes the percentile
Petra falls into?


1. P(X < 172) where X ~ N(162,5) 
2. P(X > 172) where X ~ N(162, 5) 
3. P(X =172) where X ~ N(162, 5) 
4. P(X <172) where X ~ N(162, =-) 


5. P(X > 172) where X ~ N(162, =) 


Question 14 [2 Marks] 


Which of the following is NOT a condition to check for when using the binomial 
distribution? 

1. The trials are independent. 

2. The number of trials, n, is fixed.

3. Each trial outcome can be classified as a success or a failure. 

4. The probability of a success, p, is the same for each trial. 


5. The trials follow a normal distribution.


Question 15 [2 Marks]


Which of the following statements is true for a 95% confidence interval for the mean? 


1. The confidence interval includes 95% of the observations. 


2. If we repeat the sampling 100 times, approximately 95 of the population means 
will be included in the confidence interval. 


3. If we repeat the sampling 100 times, approximately 95 of the confidence intervals 
will include the population mean. 


4. With 95% probability, the confidence interval will include the population mean. 


5. None of these are correct. 


Question 16 [2 Marks] 


The commuting times of staff and students travelling to UNE Armidale campus have
a mean of μ = 10 minutes and variance σ² = 9 minutes². A sample of 20 randomly
selected staff and students has been obtained. What is the standard error of the sample
mean, x̄?


1. 0.45 
2.18 
3. 2.01 
4. 0.67 


5. 3


Question 17 [2 Marks]


Find the median of the following sample: 


16 17 20 22 23 25 29 30 32 33 35 37 


2. 26.58 
3. 27 
4. 29


5. None of the above 


Question 18 [2 Marks] 


A sampling distribution is the probability distribution for which one of the following? 


1. A sample 

2. A sample statistic 

3. A population 

4. A population parameter 


5. A population statistic


This scenario applies to Questions 19 and 20: A conservation officer wants to
determine whether adult possums have a preference for eating over ripe or under ripe 
fruit. The officer uses a random sample of 39 adult possums within the conservation 
park and randomly assigns equal numbers of possums to the following feeding regimes: 
(i) over ripe fruit only, (ii) under ripe fruit only, or (iii) a mix of both under and over 
ripe fruit. The total mass of fruit consumed by each group is compared to determine 
preference. 


Question 19 [2 Marks] 


In reference to this study, which of the following is correct in all elements? 


1. Randomized experiment with a control, a placebo, and double-blinding 
2. A prospective observational study with a placebo and randomization 
3. Randomized experiment with blocking, a placebo, and a control. 

4. A retrospective observational study with blocking and a control. 


5. None of the above are completely correct. 


Question 20 [2 Marks] 
Which method should be used to answer the research question for the study outlined
above: Do adult possums have a preference for over ripe or under ripe fruit?


1. Paired t-test 

2. 2 sample t-test 
3. Linear regression 
4. ANOVA 


5. Chi-squared test of independence


Question 21 [2 Marks]


Which one of the following scenarios describes a problem for which a simple linear
regression would be appropriate?


1. Comparing the number of words recalled in a memory test after a student reads
either hand-written or typed word lists.


2. Analysing the relationship between jump height and mass of a rugby player.


3. The proportion of students from different colleges who support having an all-night
pizza cafe on campus.


4. Comparing the mean circumference of apples of four different varieties.


5. Exploring whether there is an increase in endurance of individuals before and
after undertaking a 6-week training program.


Question 22 [2 Marks]


Which correlation coefficient seems to match the scatterplot in Figure 4? 


1. 0.4 
2. 0.7
3. -0.4 
4. -0.7 


5. None of the above 


Figure 4


Question 23 [2 Marks]


The amount of time within a one hour basketball practice session that was spent 
focusing on free-throws was compared between the top ranked team (Blazers) and the 
fifth ranked team (Chargers) in the local basketball league. Twelve practice sessions 
from each team were randomly chosen for review. The number of minutes was recorded 
as a decimal. Assuming that the necessary conditions have been met, which of the 
following is the best conclusion, based on the R output in Table 2? 


Table 2: Results of a two sample t-test on basketball practice time 


        Welch Two Sample t-test

data:  basketball
t = 0.875, df = 21.5, p-value = 0.392
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.0  9.8
sample estimates:
 mean Blazers mean Chargers
         21.8          18.9


1. We can conclude that the mean time spent practicing free-throws by the Chargers
(x̄ = 18.9) is significantly less than that by the Blazers (x̄ = 21.8).


2. We can conclude that there is a significant difference in the mean time spent 
practicing free-throws by the two teams as the p-value=0.392 is significant. 


3. We cannot conclude that there is a significant difference in the mean time spent
practicing free-throws between the two teams as the 95% CI for the difference in
means contains 0.


4. We cannot conclude that there is a significant difference in the mean amount of 
time practicing free-throws between the two teams as the difference in sample 
means (21.8 - 18.9 = 2.9) is contained in the 95% CI for the difference in means. 


5. None of the above.


Question 24 [2 Marks]


Which of the following is NOT a condition for ANOVA? 


1. Equal group sizes 

2. Nearly normal residuals 

3. Independent observations 

4. Equal variances for all groups 


5. All of these are conditions for ANOVA 


Question 25 [2 Marks] 


Which of the following statements is correct with regard to the null hypothesis, H₀?


1. H₀ represents the research question that a researcher is interested in testing.


2. H₀ seeks to find significant differences or relationships between the variables of
interest.


3. H₀ cannot be tested.
4. H₀ represents the status quo of no differences or relationships being found.


5. H₀ is often represented by a range of possible parameter values.


Answer the following questions in the space provided.
If you run out of space you may request an additional answer book. 
Question 26 [10 Marks] 


A study has been carried out to investigate the relationship between exposure to light 
at night and weight gain in mice, because it is suspected that even low level light at 
night can interfere with normal eating and sleeping cycles. 


Mice were randomly assigned to a group that received complete darkness at night time,
or a group which was exposed to bright light all night long, with 9 mice in each group.


(a) Should a paired t-test or a 2-sample t-test be used for this study? Explain your
answer. [2 Marks]


(b) What are the null and alternative hypotheses being tested? State these using plain
language and mathematical notation. [4 Marks]


(c) Does the sample provide evidence to indicate that light at night has an effect on
weight gain in mice? With reference to the appropriate table of R output (Tables 
3 and 4), carry out a hypothesis test, justifying your conclusion. [4 Marks] 


Table 3 


Welch Two Sample t-test 


data: Dark and Light 
t = -4.5385, df = 10.939, p-value = 0.0008579 


alternative hypothesis: true difference in means is not equal to 0 


Table 4 


Paired t-test 


data: Dark and Light 
t = -3.7424, df = 8, p-value = 0.005685 


alternative hypothesis: true difference in means is not equal to 0 
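

The two R outputs above report different test statistics and degrees of freedom because the two tests are computed differently. As a rough sketch only, the Python below works through both formulas on made-up numbers. The `dark` and `light` lists are hypothetical and are not the study data, the function names are illustrative, and no p-values are computed.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements, NOT the mouse data from the study.
dark = [1.0, 2.0, 3.0, 4.0, 5.0]
light = [4.0, 6.0, 8.0, 10.0, 12.0]


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def paired_t(x, y):
    """Paired t statistic on the differences x[i] - y[i]; df = n - 1."""
    d = [a - b for a, b in zip(x, y)]
    t = mean(d) / sqrt(variance(d) / len(d))
    return t, len(d) - 1


print(welch_t(dark, light))
print(paired_t(dark, light))
```

The same two lists give different t statistics and different degrees of freedom under the two formulas, which is the pattern visible in Tables 3 and 4.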
Question 27 [6 Marks]


Monensin is a feed additive often given to cattle to aid (amongst other things) weight 
gain. A researcher is interested in whether an alternative feed additive, lasalocid, is 
a good alternative to monensin in improving weight gain in cattle. A study involving 
60 cows has been conducted, where each cow is randomly assigned to receive either 
monensin or lasalocid for a period of 4 weeks. Weight is measured at the start and 
end of the 4 week period, as well as the weight of the total food consumed during the
study.

(a) What type of study is this? [2 Marks] 
(b) What is the response variable? [2 Marks] 
(c) Identify a possible confounding variable and explain why. [2 Marks]


Question 28 [12 Marks]


New Zealand native frogs live for a long time and continue to grow throughout their 
lives. One hundred frogs (Leiopelma pakeka) were translocated from two sites (site 1 
and site 2) on an island to a new habitat (site 3). The weight gains over a 12 year period 
were compared to see whether the new habitat was as good as, or better than, the old 
habitats. Numerical and graphical summaries of the data are provided in Figure 5 & 
Table 5, and the R analysis is given in Table 6 below. 


Figure 5 (axis label: wtgn)


Table 5 


site  mean    sd    0%   25%   50%   75%  100%   n
1     1.65  1.19   0.1  0.59  1.66  2.30  4.20  14
2     2.02  1.59  -1.7  1.19  1.88  3.13  5.97  30
3     3.65  1.45   2.3  2.50  3.60  4.12  6.40   7


Table 6 


Response: wtgn 

Df Sum Sq Mean Sq F value Pr(>F) 
site 2 19.81 9.91 4.57 0.015 
Residuals 48 104.05 2.17 


(a) Following the steps of hypothesis testing, test for a difference in mean weight
gain among the three sites. [6 Marks] 


(b) Showing all calculations, verify that a 95% confidence interval for the mean weight 
gain for site 1 is (0.85, 2.44). Use t* = 2.01. [3 Marks] 


(c) The 95% CIs for the mean weight gain at the other two sites are given in Table 7.
With reference to the confidence intervals for all 3 sites, write an informative 
conclusion related to the research question of interest. [3 Marks] 


Table 7 


        lower  upper
site1    0.85   2.44
site2    1.48   2.56
site3    2.53   4.77


Question 29 [14 Marks]


A veterinary drug company wants to check at what time an animal medication
(moxidectin) will be present at safe levels for human consumption in the meat of
animals treated with the medication. The moxidectin concentration (µg/kg) in the
meat of sheep over the period of testing (days) is shown in the plot below.


(a) Add the axis labels to the plot below and identify which is the explanatory
variable and which is the response variable. Draw an appropriate regression line
to model the association between drug concentration and time on Figure 6.
[3 Marks]


Figure 6


(b) Summarise the type of relationship shown in Figure 6 (direction, strength of 
relationship, etc). [2 Marks] 


(c) A number of conditions must be met for a linear regression to be valid. Which
condition/s, if any, do NOT appear to have been met in Figure 6? Explain. [2 Marks]


(d) A linear regression model was fitted using log(concentration) of moxidectin as 
the response variable. Assuming all conditions have been met, use the R output 
in Table 8 to answer the following questions. 


Table 8 


Call: 


lm(formula = log(concentration) ~ day) 


Coefficients: 

             Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.772728   0.105108  64.436  < 2e-16
day         -0.044413   0.004699  -9.452  8.1e-09


Residual standard error: 0.3284 on 20 degrees of freedom 
Multiple R-squared: 0.8171, Adjusted R-squared: 0.8079 
F-statistic: 89.33 on 1 and 20 DF, p-value: 8.098e-09 
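

As a quick consistency check on Table 8, the t value column is the Estimate divided by its Std. Error, and a 95% confidence interval for a coefficient is the estimate plus or minus t* standard errors. The sketch below, written in Python rather than R, redoes that arithmetic for the `day` row. The critical value t* = 2.086 (20 degrees of freedom) is read from a standard t table and is an assumption here, since it does not appear in the output.

```python
# Values copied from the `day` row of Table 8.
estimate = -0.044413
std_error = 0.004699

# Assumption: two-sided 95% critical value of t with 20 df, from a t table.
t_star = 2.086

# t value = estimate / standard error
t_value = estimate / std_error

# 95% confidence interval for the slope
lower = estimate - t_star * std_error
upper = estimate + t_star * std_error

print(f"t value = {t_value:.3f}")
print(f"95% CI  = ({lower:.4f}, {upper:.4f})")
```

Both limits of the interval come out negative, with the t value agreeing with the one printed in the table.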


(i) Write the estimated linear regression equation demonstrating the relationship
between log(concentration) in sheep and day. [2 Marks]


(ii) Provide an accurate estimate of the exact point in time at which the drug company
could assume that moxidectin has depleted to the safe log(concentration) of
5.7038 log(µg/kg) in the meat of sheep. Show your working. [2 Marks]


(iii) The 95% confidence interval for the slope of the least squares regression line
from Table 8 is (-0.0542, -0.0346). Provide a meaningful interpretation of
this interval in relation to the association between moxidectin log(concentration)
and day. [3 Marks]


Question 30 [8 Marks]


Researchers are interested in whether the career someone chooses is associated with 
their dominant hand/s. A sample of Americans from a variety of professions were asked 
if they are left or right hand dominant, or ambidextrous. The results for 5 professions 
are shown in Table 9. 


Table 9 

                    | Right handed | Left handed | Ambidextrous | Total
Psychiatrist        |          101 |          10 |            7 |   118
Architect           |          115 |          26 |            7 |   148
Orthopaedic surgeon |          121 |           5 |            6 |   132
Lawyer              |           83 |          16 |            6 |   105
Dentist             |          116 |          10 |            6 |   132
Total               |          536 |          67 |           32 |   635


The sample was analysed using RStudio. The output from the analysis is presented in 
Table 10. 


Table 10 


Pearson’s Chi-squared test 


data: table 
X-squared = 19.019, df = XX, p-value = 0.01476 


(a) What is the expected number of ambidextrous orthopaedic surgeons? Show all
working. [2 Marks] 


(b) What are the degrees of freedom for this test? Show all working. [2 Marks] 


(c) What conclusion can be made about dominant handedness for these professions?
[4 Marks]


Formulae are given at the end of this paper.


Please remember: This examination question paper MUST BE HANDED IN. Failure 
to do so may result in the cancellation of all marks for this examination. Writing your 
name and number on the front will help us confirm that your paper has been returned.


Formulae


P(A|B) = P(A and B) / P(B)


P(A|B) = P(A) if A, B are independent 


P(A and B) = P(A) x P(B) if A, B are independent 
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